In a parallel data processing system in which
processor elements (PEs) are arranged in a
two-dimensional grid, each PE includes 1-bit
arithmetic means for 1-bit operand data, storage
means for storing operand data and/or the result,
and communication means for communicating
with other PEs. A common bus for connecting
PEs in a transverse (row) direction is disposed
for each PE in a longitudinal (column) direction,
or data transfer routes for connecting PEs in the
transverse (row) direction are disposed, so as to
effect communication between PEs of different
columns. The PEs in the longitudinal (column)
direction are used for 1-word storage and 1-word
operation, for example, and parallel operation is
effected for each PE of each column. The present
invention provides such a parallel data processor
and parallel data processing method.
Technical Field

The present invention relates to a data pro- 
cessing system and, more specifically, to a data 
processor for and a data processing method of 
performing parallel data processing at high speed. 

Background Art 

Data processors adapted for parallel processing
have been developed recently in order to
speed up data processing. Fig. 1 illustrates a prior
art of such data processors. In Fig. 1A, a data
processor 5 is constructed from an instruction fetch
unit 1 for receiving an instruction externally input
over a bus 6, an instruction decode unit 2 for
decoding the received instruction, an operation
execute unit 3 for reading operands out of a register
file 4 and then performing arithmetic based on
the received instruction, and the register file 4 for
storing the result of the execution.

The processes of receiving the instruction from
the bus 6, decoding it, performing arithmetic, and
storing the result of the arithmetic in the register
file 4 in Fig. 1A can be indicated by a four-stage
pipeline operation as shown in Fig. 1B. That is, in
the first prior art the pipeline processing consists
of four stages: an instruction fetch stage; an
instruction decode and operand fetch stage; an
instruction execute stage; and a result store stage. In
the case of this system, therefore, parallel
processing can be performed by providing the
processor with a plurality of arithmetic units
and a register file having multiple ports. However,
since the number of storage locations in the register
file 4 is larger than the number of the
arithmetic units 3, it takes a long time to control which of
the storage locations is to be accessed by an
arithmetic unit 3. In addition, the number of bus
lines which connect the arithmetic units 3 and the
register file 4 becomes very large. For example, if
three arithmetic units for 32-bit arithmetic
operations are provided, then as many as 32 x 2 (two
sets are needed for read and write) x 3 = 192 bus
lines will be required, even on the simplest
estimate. Further, the routing of the bus lines becomes
complicated, making an integrated circuit version
difficult. If, on the other hand, the plural arithmetic
units 3 and the register file 4 are connected by a
common bus, then a large amount of data will flow
through the common bus, so that the von Neumann
bottleneck occurs. Thus, there is a problem in that
the instruction execute stage and the result store
stage become slowed.
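
The bus-line count quoted above can be checked with a short calculation. The following Python sketch is illustrative only; the function name and its parameters are assumptions introduced here and do not appear in the prior-art processor.

```python
def bus_lines(word_bits: int, units: int, sets_per_unit: int = 2) -> int:
    """Bus lines needed when each arithmetic unit has its own
    read set and write set of lines to the register file.
    (Hypothetical helper; the names are not from the source.)"""
    return word_bits * sets_per_unit * units


# Three 32-bit arithmetic units, one read set and one write set each.
print(bus_lines(32, 3))
```

The count grows linearly with both the word width and the number of arithmetic units, which is the wiring problem the passage describes.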

Fig. 2 illustrates a second prior art of data
processors, which uses a logic-in-memory system
in which the arithmetic facility and the storage
facility are integrated on the same chip and
performs serial-by-bit arithmetic. The chip is
composed of, say, 64K (2^16) basic gate cells each
comprising a 4K (2^12)-bit external memory 7, a
serial arithmetic and logic unit (ALU) 8, and an
internal flag register 9. All the buses are 1 bit in
width.

In Fig. 2, two pieces of data A and B stored in
separate locations in the external memory 7 are
read out as input data from the external memory 7
to the ALU 8, arithmetic is performed by the ALU
8, and the result is stored in the external memory 7
again. The flag register 9 generates a condition
code for arithmetic to be performed by the ALU 8
and is used, for example, to store an overflow bit
and a carry output and re-enter a carry-in to the
high-order bit into the ALU 8 when the result of
arithmetic by the ALU 8 causes overflow.

In performing 32-bit arithmetic processing, by
way of example, the prior art of Fig. 2 requires that
a process of reading data to be operated on from
the external memory 7 be performed 32 times and
a process of writing the result into the memory be
performed 32 times. Thus, a problem arises in that
communication time between the external memory
7 and the ALU 8 becomes long, making speeding
up of data processing impossible.
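
The cost of serial-by-bit arithmetic can be illustrated with a small model. The sketch below is an assumption-laden illustration, not the circuit of Fig. 2: the memory is a Python list of bits, the carry variable stands in for the flag register 9, and one "read process" is taken to deliver the operand bits for one bit position.

```python
def to_bits(value, bits=32):
    """Least-significant-bit-first list of 0/1 values."""
    return [(value >> i) & 1 for i in range(bits)]


def from_bits(bit_list):
    """Inverse of to_bits."""
    return sum(b << i for i, b in enumerate(bit_list))


def serial_add(memory, addr_a, addr_b, addr_out, bits=32):
    """Add two words one bit at a time through a 1-bit ALU model.
    Returns (final carry, read processes, write processes)."""
    carry = 0              # stands in for the flag register
    reads = writes = 0
    for i in range(bits):  # least significant bit first
        a = memory[addr_a + i]
        b = memory[addr_b + i]
        reads += 1         # one read process per bit position
        total = a + b + carry
        memory[addr_out + i] = total & 1
        writes += 1        # one write process per bit position
        carry = total >> 1
    return carry, reads, writes


memory = to_bits(5) + to_bits(7) + [0] * 32
carry, reads, writes = serial_add(memory, 0, 32, 64)
print(from_bits(memory[64:96]), reads, writes)
```

Every bit position costs a memory read and a memory write, so a 32-bit operation needs 32 of each regardless of how fast the ALU itself is.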

Disclosure of Invention 

It is an object of the present invention to
provide a processor element which performs
high-speed data communication with another processor
element while circumventing the von Neumann
bottleneck and performs parallel data processing using
the result of the communication, and an
architecture of a data processing system using such
processor elements.

It is another object of the present invention to
provide a parallel data processor suitable for a
semiconductor integrated circuit version.

Fig. 3 is a block diagram illustrating
fundamentals of a processor element as a data
processor of the invention. As shown, a processor element is
equipped with a 1-bit arithmetic means 11 for
performing arithmetic on 1-bit data to be operated
on, e.g., a 1-bit arithmetic unit; storage means 12
for storing the data to be operated on and the
result of the arithmetic by the 1-bit arithmetic
means 11, e.g., a 1-bit memory unit; and
communication means 13 for permitting communication
between each of the 1-bit arithmetic means 11
and the storage means 12 and another processor
element. In Fig. 3, the storage means 12 is
incorporated into the processor element to construct a
logic-in-memory. Since the output of the 1-bit
arithmetic means 11 is connected to the storage
means 12, consisting of, for example, 1-bit
memory, directly with no intervening bus, it
becomes unnecessary to spend time selecting a location
to be written in within the storage means 12.
Therefore, when the processor element performs
pipeline operation, the result store (write) stage can
be shortened considerably, improving the arithmetic
speed of the processor element.

Fig. 4 is a diagram for use in explanation of a
concept of data processing in a parallel data
processing system in which processor elements, each
configured as shown in Fig. 3, are arranged in a
matrix. In the figure, suppose that 32 processor
elements are arranged in each of the top-to-bottom
and left-to-right directions and thus the
total number of processor elements is 1024.
Suppose that the processor elements arranged in the
left-to-right direction are connected by, for
example, a common data bus not shown, while the
processor elements arranged in the top-to-bottom
direction are connected by, for example, a
data transfer line for transferring a carry, etc.
Further, suppose that each of the processor elements is
connected to an I/O bus for inputting/outputting
data from the outside of the system.

In Fig. 4, the processors in the top-to-bottom
direction store, for example, 1-word data and
perform arithmetic on that data. For example, each
of the elements in a column stores a corresponding
bit of 32-bit data, with the least significant bit
stored in the lowest element in the column, and
performs arithmetic on the stored bit.

In Fig. 4, from each of the processor elements
PE1 in the leftmost column to a corresponding one
of the 32 processor elements PE2 in the second
column from the left is transferred 1-bit internal
data D1 stored in each processor using the
common data bus not shown or inter-processor-element
data transfer, or data D2 which is
externally input over the I/O bus. Each of the elements
PE2 performs arithmetic on the transferred data D1
or D2 and 1-bit data D stored in it. The result is
stored in the storage means 12 in each of the
elements PE2. Alternatively, the result of the
arithmetic may be output to outside over the I/O
bus as required.
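
The word-per-column arrangement can be sketched in software. The model below is a minimal illustration under stated assumptions: addition is chosen as the example operation, the class name PE and the function column_add are hypothetical, and the Python loop stands in for the carry travelling up the column over the data transfer line (in the hardware the bit-wise transfers along the rows occur in parallel).

```python
from dataclasses import dataclass


@dataclass
class PE:
    stored: int = 0  # models the 1-bit storage means 12


def column_add(column, incoming):
    """Add an incoming word (D1 or D2, one bit per row) to the word
    held in a column of PEs. column[0] is the lowest element, which
    holds the least significant bit. Returns the final carry out."""
    carry = 0
    for pe, d in zip(column, incoming):
        total = pe.stored + d + carry  # 1-bit arithmetic means 11
        pe.stored = total & 1          # result goes straight to storage
        carry = total >> 1             # handed to the next element up
    return carry


word = lambda v: [(v >> i) & 1 for i in range(32)]
pe2_column = [PE(bit) for bit in word(9)]   # data D held in column PE2
carry_out = column_add(pe2_column, word(6))  # data D1 from column PE1
print(sum(pe.stored << i for i, pe in enumerate(pe2_column)))
```

Each element writes its own result bit into its own storage, so no address selection is needed at the store step.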

In Fig. 4, when complicated processing
requiring a pipeline process to be performed a large
number of times is performed, the pipeline process
can be speeded up by repeating, a required
number of times, combined processes of fetching an
externally input instruction, decoding it, transferring
necessary data to each processor according to the
result of the instruction decoding, performing
arithmetic, and storing the result. Further, since
processor elements each comprising 1-bit
arithmetic means and storage means are arranged
in the form of an array, the degree of parallelism
can be much improved by performing parallel
arithmetic for each word.



Since, as shown in Fig. 4, the arrangement of
the processor elements and the wiring among the
processor elements are made regular, the data
processor according to the present invention is
suitable for an integrated circuit version.

Brief Description of Drawings 

Figs. 1A and 1B are diagrams explanatory of a first
prior-art data processor;
Fig. 2 is a diagram explanatory of a second
prior-art data processor;
Fig. 3 is a block diagram illustrating fundamentals
of a processor element as a data processor
of the present invention;
Fig. 4 is a diagram illustrating the concept of
data processing in a parallel data processing
system using the processor element of the
present invention;
Fig. 5 is a block diagram of a first embodiment
of the parallel data processing system;
Fig. 6 is a detailed block diagram of the
processor element of Fig. 5;
Fig. 7 is a block diagram of an embodiment of a
PE controller;
Fig. 8 is a block diagram of a second embodiment
of the parallel data processing system;
Fig. 9 is a detailed block diagram of the
processor element of Fig. 8;
Fig. 10 is a flowchart of an embodiment of the
parallel data processing of the present invention;
Fig. 11 is a diagram for a supplemental
explanation of the flowchart of Fig. 10;
Figs. 12A, 12B and 12C are diagrams for use in
explanation of a pipeline process of the present
invention; and
Fig. 13 is a diagram of a concrete example of
arithmetic processing in an embodiment of the
present invention.



Best Mode of Carrying Out the Invention 

Fig. 5 is a block diagram of a first embodiment
of a parallel data processing system according to
the present invention. In the figure, processor
elements 10 are arranged in the top-to-bottom and
left-to-right directions to form a matrix. The
top-to-bottom and left-to-right directions
correspond to one word. For example, when one word
is 32 bits, 32 processor elements 10 are arranged
in a column. The processor elements 10 in each
column permit parallel arithmetic. In the left-to-right
direction a common data bus 17 is provided
for processor elements (PE) in each row. In the
top-to-bottom direction the PEs 10 are
connected to each other by a data transfer line 18 for
transferring a carry, etc., as will be described later.
In common with the parallel processor elements
[...] required to control storage locations in the register
file, as required in the prior art, the common bus is
not used on a time-sharing basis in transferring
the results of arithmetic from the arithmetic units to
the register file, and data transfers are not delayed.
In addition, the PEs themselves are capable of not
only high-speed operation but also parallel
operation when arranged in an array, thus much
increasing the speed of processing.

Fig. 11 is a diagram for use in supplemental
explanation of the flowchart of Fig. 10. In the figure,
each controller 30 sends various control signals to
processor elements in the top-to-bottom
direction, that is, to a processor element array 20, in
accordance with the result of decoding of an
instruction by the instruction decode unit 16. Each
1-bit arithmetic unit 21 comprising the processor
element array 20 performs arithmetic using input
data DIN to be operated on which is externally
input over the I/O bus 19. The result is output to
outside via the 1-bit memory unit 22 as arithmetic
result data DOUT.

Figs. 12A, 12B and 12C are diagrams for use
in explanation of stages of pipeline processing
according to the present invention which is performed
in the processor elements PE1, PE2 and PE3
arranged in the direction of a row. In the figure, data
transfer (T) is pipeline-processed subsequent to
the two stages of instruction fetching (F) and
decoding (D). Control signals from the instruction
decode unit are applied to the processor elements
PE1, PE2 and PE3 in parallel, so that the execution
of arithmetic (E) and the writing of the result (W)
are processed in parallel in the processor
elements. Since the data transfer (T) time after the
execution (E) can almost be neglected, the writing
(W) of the result can be performed very fast.
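
The stage sequence named above can be laid out as a simple timing chart. This Python sketch is only an illustration and rests on assumptions the text does not state: every stage is taken to last one cycle, one instruction is issued per cycle, and the stage order F, D, T, E, W is read from the description rather than from Figs. 12A to 12C themselves. The function name pipeline_chart is hypothetical.

```python
STAGES = ["F", "D", "T", "E", "W"]  # fetch, decode, transfer, execute, write


def pipeline_chart(num_instructions):
    """One string per instruction; column k is cycle k.
    A '.' marks a cycle in which that instruction is idle."""
    rows = []
    for i in range(num_instructions):
        before = "." * i
        after = "." * (num_instructions - 1 - i)
        rows.append(before + "".join(STAGES) + after)
    return rows


for row in pipeline_chart(3):
    print(row)
```

Under these assumptions, n instructions finish in len(STAGES) + n - 1 cycles instead of len(STAGES) * n, and the E and W columns apply to all processor elements of a row at once because the control signals reach them in parallel.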

Next, in the present invention, the processor 
elements arranged in a two-dimensional matrix 
with n rows and m columns can substantially be 
split into groups of processor elements for re- 
spective independent arithmetic operations. 

Fig. 13 is a diagram for use in explanation of
an embodiment of such split usage of a parallel
data processing system, which illustrates provision
of more than one common data bus 55 for
processor elements arranged from left to right. That is,
this embodiment can be considered as including
more than one common data bus in the
embodiment of the first parallel data processing system
(Fig. 5) in which one common data bus 17 is
provided for processor elements in the left-to-right
direction.

In Fig. 13, there are shown, for simplicity, three
processor elements PE1, PE2 and PE3 in the
left-to-right direction and only one processor in the
top-to-bottom direction. However, 32 elements
exist both in the left-to-right direction and in the
top-to-bottom direction. The PEs in the
top-to-bottom direction constitute one word. Each
element (PE) is shown, for simplicity, as
comprising an arithmetic unit 51 corresponding to the
1-bit arithmetic means of Fig. 3, a storage unit 52
corresponding to the storage means, and registers
53 and 54 which temporarily store operand data
transferred from other PEs over the common data
buses 55 and whose stored contents are read out
by high-speed clocks at arithmetic time. In this
example, the communication units in Fig. 6 are
omitted. In addition, although a plurality of common
data buses are shown above (N) and below (S) the
PEs, as the common data buses 55, this is not
restrictive.

It is assumed that the system is split for use, 
and the common data buses 55 are larger in 
number than is necessary to an instruction to per- 
form simultaneous parallel arithmetic. 

In Fig. 13, for example, data "A" is transferred
from the storage section 52 in PE1 on the left (W)
to the register 53 in PE2 over the upper common
data bus 55. And data "B" is transferred from the
storage section 52 in the PE3 on the right (E) to
the register 54 in the PE2 over the lower common
data bus 55. For example, arithmetic "A + B" is
performed by the arithmetic unit 51 in the PE2 and
then the result is stored in the storage unit 52 in
the PE2.

At the same time as such operations are
performed by the PE1 to PE3, PE4 to PE6 (assumed
to be to the right of the PE3), not shown, can
execute addition of two pieces of data "C" and "D"
in exactly the same manner. In this case, out of the
common data buses 55, buses that are not used by
the PE1 to PE3 are used by the PE4 to PE6, which
substantially splits the parallel data processing
system for use and ensures very efficient system
usage.
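
The split usage can be pictured as giving every simultaneous transfer its own common data bus. The sketch below is a hypothetical illustration: the tuple encoding, the bus numbering, and the choice of PE5 as the element that adds "C" and "D" are assumptions made here by analogy with PE2, not details stated in the text.

```python
def assign_buses(transfers, num_buses):
    """Map each simultaneous transfer to its own common data bus.
    transfers: list of (source PE, destination PE, destination register).
    Raises ValueError if one instruction needs more buses than exist."""
    if len(transfers) > num_buses:
        raise ValueError("not enough common data buses for one instruction")
    return {bus: transfer for bus, transfer in enumerate(transfers)}


transfers = [
    ("PE1", "PE2", 53),  # data "A" into register 53 of PE2
    ("PE3", "PE2", 54),  # data "B" into register 54 of PE2
    ("PE4", "PE5", 53),  # data "C" (assumed destination PE5)
    ("PE6", "PE5", 54),  # data "D" (assumed destination PE5)
]
plan = assign_buses(transfers, num_buses=4)
for bus, (src, dst, reg) in plan.items():
    print(f"bus {bus}: {src} -> register {reg} of {dst}")
```

With four buses the two additions proceed in the same cycle; with fewer buses than transfers the groups would have to take turns, which is the time-sharing the embodiment avoids.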

When the processor elements, as shown in Fig.
9, are arranged in an array as shown in Fig. 8, the
processor elements can be split in word units and
operated in parallel in the same manner as
described above by controlling the output
communication units for the E and W directions.

According to the present invention, naturally
the processor element shown in Fig. 6 which uses
common data buses and the processor element
shown in Fig. 9 which uses an inter-processor-element
transfer path in the direction of a row may
be combined into one processor element, and the
resulting processor elements may be arranged in
an array.

Industrial Applicability

According to the present invention, the result
writing (W) stage can be speeded up in each PE,
and the degree of parallelism can be improved by
arranging PEs in an array to much increase the
speed of data arithmetic operation. Moreover, the
device of the present invention is suitable for a
one-chip integrated circuit version because
common data buses and inter-PE transfer paths are
able to be wired regularly in the left-to-right and
the top-to-bottom directions and PEs can be
formed in the same pattern. Furthermore, since the
need for a large number of bus lines to be driven
to transfer data as in a conventional parallel data
processor is eliminated, the processing speed also
improves.

Accordingly, the data processors of the present 
invention can be used in various types of parallel 
data processing systems such as an image pro- 
cessing system, etc. 

Claims 

1. A processor element characterized by com- 
prising: 

arithmetic means (11) for performing 
arithmetic on data to be operated on; 

storage means (12) for storing the data to 
be operated on or the result of the arithmetic; 

communication means (13) for permitting 
communication between said arithmetic means 
(11) and another processor element; and 

access means for accessing said storage 
means (12) independently of said commu- 
nication means. 

2. A processor element according to claim 1, 
characterized in that said arithmetic means is 
one bit. 

3. A processor element according to claim 1,
characterized in that said storage means is one
bit.

4. A processor element according to claim 1, 
characterized in that said storage means stores 
data of one or more bits to be operated on 
and/or the result of the arithmetic. 

5. A processor element according to claim 1, 
characterized in that said storage means (12) 
stores data to be operated on as a result of a 
communication process of said communication 
means (13) or data to be operated on which is 
externally input without being routed through 
said communication means (13). 

6. A processor element according to claim 1, 
characterized in that said communication 
means is connected to a data transfer line for 
transferring a carry and to a common bus. 



7. A processor element according to claim 1,
characterized in that said communication
means is connected to a data transfer line for
transferring a carry transfer line and a data
transfer line to an adjacent processor element.

8. A processor element comprising:
an arithmetic unit;

a memory unit connected to said
arithmetic unit;

a communication unit for selectively
bypassing communication between said
arithmetic unit and another processor element
and data; and

an interface for accessing said memory
unit independently of said communication unit.

9. A data processor having processor elements
arranged in a matrix, characterized in that
each of said processor elements comprises:

arithmetic means (11) for performing
arithmetic on data to be operated on;

storage means (12) for storing the data to
be operated on or the result of the arithmetic;

communication means (13) for permitting
communication between said arithmetic means
(11) and another processor element; and

access means for accessing said storage
means (12) independently of said communication
means,

a signal generated by each of said processor
elements being transferred in the direction
along a column and common buses
being provided in the direction along a row to
thereby execute arithmetic operation in parallel.

10. A data processor according to claim 9,
characterized by providing at least as many common
data buses as there are simultaneous
parallel arithmetic operations along first, second
and third processor elements arranged in
a row, and performing an arithmetic operation
in parallel which performs arithmetic on the
contents of said storage unit of said first
processor element and the contents of said storage
unit of said second processor element in
said arithmetic unit of said third processor
element and stores the result of the arithmetic
in said storage unit of said third processor
element.

11. A data processor according to claim 9,
characterized in that said processor elements
arranged in a matrix are assembled into one
chip.
12. A data processor according to claim 9,
characterized in that said parallel arithmetic
operation is performed by causing each column
of processor elements to perform arithmetic
corresponding to one instruction fetched in
parallel.

13. A data processor according to claim 9, characterized
in that said parallel arithmetic operation is
performed on a pipeline basis.

14. A data processor according to claim 9, char- 
acterized in that one controller is provided for 
each group of processor elements arranged in 
a column, and said controller comprises a 
common bus controller for determining 
whether the result of arithmetic is to be output 
onto a common bus line, an arithmetic con- 
troller for controlling what arithmetic is to be 
performed by said arithmetic unit and a 
memory I/O controller for performing control 
between a memory and an I/O bus. 

15. A data processor having processor elements 
arranged in a matrix characterized in that 

each of said processor elements com- 
prises: 

arithmetic means (11) for performing 
arithmetic on data to be operated on; 

storage means (12) for storing the data to 
be operated on or the result of the arithmetic; 

communication means (13) for permitting
communication between said arithmetic means
(11) and another processor element; and

access means for accessing said storage
means (12) independently of said communication
means,

a signal generated by each of said pro- 
cessor elements being transferred in the di- 
rection of a column and data being transferred 
in the direction of a row to thereby execute 
arithmetic operation in parallel. 

16. A data processor according to claim 15,
characterized by providing at least as many common
data buses as there are simultaneous
parallel arithmetic operations along first, second
and third processor elements arranged in
a row, and executing an arithmetic operation in
parallel which performs arithmetic on the
contents of said storage unit of said first processor
element and the contents of said storage unit
of said second processor element in said
arithmetic unit of said third processor element
and stores the result of the arithmetic in said
storage unit of said third processor element.



17. A parallel data processing system characterized
by comprising:

n x m unit data processor elements comprising
m sets of n unit data processing elements
connected by a data transfer path,

n common data buses (17) each for m
corresponding unit data processor elements in
m sets of n unit data processor elements;

an instruction fetch unit (15) for fetching an
instruction;

an instruction decode unit (16) for decoding
the instruction fetched by said instruction
fetch unit (15); and

m processor element controllers (30) for
outputting a control signal for data processing
to each of n unit data processor elements in
each of said m sets according to the instruction
decoded by said instruction decode unit
(16).

18. A parallel data processing system according to
claim 17, characterized in that each of said
data processor elements comprises:

a 1-bit arithmetic unit (21) for performing
arithmetic on 1-bit data to be operated on;

a 1-bit memory unit (22) for storing an
output of said 1-bit arithmetic unit (21) or data
input from the outside of the system and
outputting the stored content to said 1-bit
arithmetic unit (21);

a read/write interface (24) for controlling
data input/output between said 1-bit memory
unit (22) and the outside of the system;

an input communication unit (23) for
inputting data other than a carry input which is
input from said common data buses (17) or an
adjacent unit data processor element of said n
unit data processor elements to said 1-bit
arithmetic unit (21) and inputting a carry input
from said adjacent data processor element to a
carry input terminal (23) of said 1-bit
arithmetic unit (21);

an output communication unit (25) for
selectively outputting a carry output of said 1-bit
arithmetic unit (21) or an output of said 1-bit
memory unit (22) to said adjacent data
processing element; and

a tri-state buffer (27) for outputting an
output of said 1-bit memory unit (22) to said
common data buses (17).

19. A parallel data processing system according to
claim 17, characterized in that each of said m
processor element controllers (30) for each of
m sets of n data processor elements comprises:

a memory controller (33) for outputting a
data input/output control signal to said
read/write interface (24);

an arithmetic controller (32) for outputting
an arithmetic control signal to said 1-bit
arithmetic unit (21) and said input communication
unit (23); and

a data transfer controller (31) for outputting
a data transfer control signal to said input
communication unit (23), an output communication
unit (25) and said tri-state buffer
(27).

20. A parallel data processing system according to
claim 17, characterized in that each of said m
sets of unit data processing elements are in
charge of processing data in word units, and
processes (S42 to S44) of, according to an
instruction fetched by said instruction fetch unit
(15) and decoded by said instruction decode
unit (16), performing necessary data transfers
between unit bit-corresponding data processing
units in said m sets of unit data processing
units (S42), performing arithmetic by
each unit data processing unit (S43), and
storing the result of arithmetic in storage
means (S44) are performed a required number
of times according to the contents of the
instruction decoded.

21. A parallel data processing system according to
claim 17, characterized in that in place of one
common data bus for each set of m corresponding
unit data processor elements in
said m sets of n unit data processor elements,
at least a plurality of (l) common data buses
required at a time of execution of simultaneous
parallel arithmetic instruction are provided, and
the total number of n x l common data buses
(55) are provided.

22. A parallel data processing system comprising:

n x m unit data processor elements arranged
in a two-dimensional matrix with n
columns and m rows, said elements being
connected by first m data transfer lines in the
direction along a column and by second n data
transfer lines in the direction along a row;

an instruction fetch unit (15) for fetching an
instruction;

an instruction decode unit (16) for decoding
the instruction fetched by said instruction
fetch unit (15); and

m processor element controllers (30) each
for outputting a control signal for data processing
to n unit data processing units in a
corresponding one of m columns.

23. A parallel data processing system according to
claim 22, characterized in that each of said
data processor elements comprises:

a 1-bit arithmetic unit (21) for performing
arithmetic on 1-bit data to be operated on;

a 1-bit memory unit (22) for storing an
output of said 1-bit arithmetic unit (21) or data
input from the outside of the system and
outputting the stored content to said 1-bit
arithmetic unit (21);

a read/write interface (24) for controlling
data input/output between said 1-bit memory
unit (22) and the outside of the system;

an input communication unit (23) for
inputting data other than a carry input which is
input from an adjacent unit data processor
element of n unit data processor elements in
the direction along a column or m unit data
processor elements in the direction along a
row to said 1-bit arithmetic unit (21) and
inputting a carry input from said adjacent data
processor element to a carry input terminal
(23) of said 1-bit arithmetic unit (21);

an output communication unit (25) for
selectively outputting a carry output of said 1-bit
arithmetic unit (21) or an output of said 1-bit
memory unit (22) to said adjacent data
processing element in the direction along a
column; and

second and third output communication
units (35, 36) for selectively outputting data
input from an adjacent unit data processor
element of m data processor elements in a row
or an output of said 1-bit memory unit (22) to
said adjacent unit data processor elements in
the direction opposite to said adjacent unit
data processor element of m data processor
elements in a row.

24. A parallel data processing system according to
claim 22, characterized in that each of m PE
controllers (30) for m sets of n unit data
processor elements comprises a memory controller
(33) for outputting a data input/output
control signal to said read/write interface (24),
an arithmetic controller (32) for outputting an
arithmetic control signal to said 1-bit
arithmetic unit (21) and said input communication
unit (23), and a data transfer controller
(31) for outputting a data transfer control signal
to said input communication unit (23) and said
first, second and third output communication
units (25, 35, 36).

25. A parallel data processing system according to
claim 22, characterized in that each of said m
sets of n unit data processing elements are in
charge of processing data in word units, and
processes (S42 to S44) of, according to an
instruction fetched by said instruction fetch unit
(15) and decoded by said instruction decode
unit (16), performing necessary data transfers
between bit-corresponding data processing
units in said m sets of unit data processing
units (S42), performing arithmetic by each unit
data processing unit (S43), and storing the
result of arithmetic in storage means (S44) are
performed a required number of times according
to the contents of the instruction
decoded.

26. A data processing method characterized by
comprising the steps of:

performing arithmetic processing on data
bit by bit on the basis of an instruction;

writing the result of the arithmetic into a
1-bit memory; and

transferring data read out of said 1-bit
memory to outside, thereby performing unit
data processing.

27. A data processing method according to claim
26, characterized in that said unit data processing
is performed in each of processor elements
arranged in an array, processing of one word is
performed by processor elements in a column,
and two or more words are processed in
parallel.